Appl. No. 10/632,431 



Attorney Docket: P15449



Amendments to the Claims:

This listing of claims will replace all prior versions, and listings, of claims in the 
application. 

1. (Currently Amended) An apparatus comprising: 

a first processor to execute a main thread instruction stream that includes a delinquent 
instruction, wherein the first processor is to be associated with a first private cache;

a second processor to execute a helper thread instruction stream that includes a subset of 
the main thread instruction stream, wherein the subset includes the delinquent instruction 
and wherein the second processor is to be associated with a second private cache;

wherein said first and second processors each include a private data cache;

a shared memory system coupled to said first processor and to said second processor; and 

control logic to push retrieve, responsive to a miss of requested data for the delinquent
instruction in the second private cache associated with [[of]] the second processor, the
requested data from the shared memory system; the logic further to provide the requested
data to the first private data cache associated with [[of]] the first processor.
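
For illustration only, the data flow recited in claim 1 can be modeled in software. The following is a minimal sketch, not the claimed hardware: the class names `PrivateCache` and `ControlLogic`, the method `helper_load`, and the `fill_helper` flag are all hypothetical, and a Python dict stands in for the shared memory system.

```python
class PrivateCache:
    """Toy private data cache: maps an address to its cached data."""

    def __init__(self):
        self.lines = {}

    def holds(self, addr):
        return addr in self.lines

    def fill(self, addr, data):
        self.lines[addr] = data


class ControlLogic:
    """Hypothetical software model of the control logic (names are illustrative)."""

    def __init__(self, shared_memory, main_cache, helper_cache):
        self.shared_memory = shared_memory  # dict standing in for the shared memory system
        self.main_cache = main_cache        # first private cache (main thread)
        self.helper_cache = helper_cache    # second private cache (helper thread)

    def helper_load(self, addr, fill_helper=False):
        """Handle the delinquent load as executed by the helper thread."""
        if self.helper_cache.holds(addr):
            return self.helper_cache.lines[addr]  # hit: nothing to push
        data = self.shared_memory[addr]           # miss: retrieve from shared memory
        self.main_cache.fill(addr, data)          # provide the data to the main thread's cache
        if fill_helper:                           # optional fill of the helper's own cache
            self.helper_cache.fill(addr, data)
        return data


shared = {0x40: "payload"}
main, helper = PrivateCache(), PrivateCache()
logic = ControlLogic(shared, main, helper)
logic.helper_load(0x40)
print(main.holds(0x40), helper.holds(0x40))  # True False
```

In this sketch the main thread's cache receives the line even though the main thread never requested it; the optional `fill_helper` path corresponds loosely to the additional fill recited in claim 6.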



2. (Currently Amended) The apparatus of claim 1, wherein: the first processor,
second processor and logic are included within a chip package, and wherein the
shared memory system includes a shared cache.

3. (Currently Amended) The apparatus of claim 1, further comprising retrieval logic
coupled to the control logic to retrieve the requested data from the shared memory
system in response to the miss of the requested data for the delinquent instruction
in the second private cache, wherein: the shared memory system includes a shared
cache.



4. (Currently Amended) The apparatus of claim 3, wherein: the control logic to
push, responsive to a miss of requested data for the delinquent instruction in the
second private cache, the requested data to the first private data cache of the first
processor comprises the control logic, responsive to the miss of the requested data
and the retrieval logic retrieving the requested data, to broadcast the requested
data to an affinity group of processors including at least the first processor, the
shared memory system includes a second shared cache.



5. (Currently Amended) The apparatus of claim 3, wherein: the control logic to
push, responsive to a miss of requested data for the delinquent instruction in the
second private cache, the requested data to the first private data cache of the first
processor comprises the control logic, responsive to the miss of the requested data
and the retrieval logic retrieving the requested data, to unicast the requested data
to the first processor, the shared cache is included within a chip package.

6. (Currently Amended) The apparatus of claim 1, wherein: the control logic is
further to provide the requested data from the shared memory system to the
second private data cache associated with [[of]] the second processor.

7. (Currently Amended) A processor comprising: The apparatus of claim 1,
wherein: said first and second processors are included in a plurality of n 
processors, where n > 2; each of said plurality of processors is coupled to the 
shared memory system; and each of said n plurality of processors includes a 
private data cache. 

a first processor core to be associated with a first private cache and to execute a 
first thread; 

a second processor core to be associated with a second private cache and to 
execute a second speculative thread, the second speculative thread to 
include a load instruction from the first thread that a miss to the first 
private cache is anticipated during execution of the first thread; and 

logic to retrieve data for the load instruction from a higher level memory and to
provide the data to the first private cache associated with the first processor
core in response to a miss to the second private cache responsive to 
execution of the load instruction in the second speculative thread with the 
second processor core. 

8. (Currently Amended) The processor apparatus of claim 7, wherein: the logic to
retrieve data for the load instruction from a higher level memory and to provide 
the data to the first private cache comprises in response to a miss to the second 
private cache responsive to execution of the load instruction in the second 
speculative thread with the second processor core comprises the logic to 

retrieve the data in response to the miss to the second private cache 
responsive to execution of the load instruction in the second 
speculative thread with the second processor, and 

broadcast the data to private caches associated with an affinity group of 
processor cores, the affinity group of processor cores including at 
least the first processor core, is further to provide the requested
data from the shared memory system to each of the n private data
caches.

9. (Currently Amended) The processor apparatus of claim 7, wherein: the logic to
retrieve data for the load instruction from a higher level memory and to provide 
the data to the first private cache comprises in response to a miss to the second 
private cache responsive to execution of the load instruction in the second 
speculative thread with the second processor core comprises the logic to 

retrieve the data in response to the miss to the second private cache 
responsive to execution of the load instruction in the second 
speculative thread with the second processor, and 

unicast the data to the first private cache associated with the first private 
core, the logic is further to provide the requested data from the 
shared memory system to a subset of the n private data caches, the
subset including x of the n private data caches, where 0 < x < n.
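
The difference between the broadcast delivery of claim 8 and the unicast delivery of claim 9 can be sketched as follows. This is an illustrative model under stated assumptions, not the claimed logic: the function names `push`, `unicast`, and `broadcast`, the core identifiers, and the use of one dict per private cache are all hypothetical.

```python
def push(caches, targets, addr, data):
    """Write data into the private cache of every core listed in targets."""
    for core in targets:
        caches[core][addr] = data


def unicast(caches, main_core, addr, data):
    """Deliver the data only to the private cache of the main core."""
    push(caches, [main_core], addr, data)


def broadcast(caches, affinity_group, main_core, addr, data):
    """Deliver the data to every core in the affinity group, which is
    assumed to include at least the main core."""
    group = set(affinity_group) | {main_core}
    push(caches, group, addr, data)


# Four cores, each with an empty private cache (illustrative setup).
caches = {core: {} for core in ("c0", "c1", "c2", "c3")}
unicast(caches, "c0", 0x100, "A")
broadcast(caches, ["c1", "c2"], "c0", 0x200, "B")
print(sorted(core for core in caches if 0x100 in caches[core]))  # ['c0']
print(sorted(core for core in caches if 0x200 in caches[core]))  # ['c0', 'c1', 'c2']
```

Under these assumptions, unicast touches one private cache while broadcast touches a subset of the caches that always contains the main core's cache.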

10. (Currently Amended) The processor apparatus of claim 7 [[1]], wherein: the
higher-level memory includes a memory coupled external to the processor, and
wherein the first processor core is further to trigger the second processor core's
execution of the second speculative thread helper thread instruction stream
responsive to execution of a trigger instruction in the first main thread with the
first processor core instruction stream.

11. (Currently Amended) An apparatus comprising:

a first processor to execute a main thread instruction stream that includes a delinquent
load instruction, wherein the first processor is to be associated with a first data
cache;

a second processor to execute a helper thread instruction stream that includes a subset of
the main thread instruction stream, wherein the subset includes the delinquent load
instruction, and wherein the second processor is to be associated with a second
data cache;

wherein said first and second processors each include a private data cache; and

retrieval logic to retrieve, responsive to a miss of requested data for the delinquent load
instruction in a first one of the second private data cache[[s]], the requested data;
and

control logic to provide the requested data to the first data cache associated with the first 
processor in response to the retrieval logic retrieving the requested data and 
without a request from the first processor. 

from the other private data cache if said requested data is available in the other private 
data cache; 

the logic further to provide the requested data to the first private data cache. 

12. (Currently Amended) The apparatus of claim 11, further comprising: a third
processor and a shared memory system coupled to said first processor, and to said
second processor, and said third processor; said logic is further to retrieve the
requested data from the shared memory system if the requested data is not
available in the other private data cache.

13. (Currently Amended) The apparatus of claim 12 [[11]], wherein: retrieval logic
to retrieve, responsive to a miss of requested data for the delinquent load
instruction in the second data cache, the requested data comprises:

the retrieval logic, responsive to the miss of requested data for the delinquent load 
instruction in the second data cache, to issue a snoop for the requested 
data to a third data cache associated with the third processor, and 

the third data cache to provide the requested data to the retrieval logic in response 
to receiving the snoop issued from the retrieval logic, the logic is 
included within an interconnect, wherein the interconnect is to provide
networking logic for communication among the first processor, the 
second processor, and the shared memory system. 

14. (Original) The apparatus of claim 13, wherein: the retrieval logic to retrieve,
responsive to a miss of requested data for the delinquent load instruction in the
second data cache, the requested data further comprises:

the retrieval logic, responsive to the miss of requested data for the delinquent load 
instruction in the second data cache, to request the requested data from 
the shared memory system concurrently with issuing the snoop to the 
third data cache; and 

the retrieval logic to receive the requested data from the shared memory system in 
response to a miss of the requested data in the third data cache. 
the first and second processor are each included in a plurality of n processors; and the
interconnect is further to concurrently broadcast a request for the requested data to each of
the n processors and to the shared memory system.
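
The retrieval order recited in claims 13 and 14 (a snoop to the third data cache issued concurrently with a request to the shared memory system, with the memory response used only on a snoop miss) can be sketched in software. This is a rough analogy that uses threads in place of interconnect hardware; the function names `snoop`, `read_shared`, and `retrieve` are hypothetical, and dicts stand in for the third data cache and the shared memory system.

```python
from concurrent.futures import ThreadPoolExecutor


def snoop(cache, addr):
    """Return the line if the snooped cache holds it, else None."""
    return cache.get(addr)


def read_shared(memory, addr):
    """Read the line from the (dict-modeled) shared memory system."""
    return memory[addr]


def retrieve(addr, third_cache, shared_memory):
    """Issue the snoop and the memory request together; prefer the snoop hit."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        snoop_result = pool.submit(snoop, third_cache, addr)
        memory_result = pool.submit(read_shared, shared_memory, addr)
        hit = snoop_result.result()
        if hit is not None:
            return hit, "third_cache"
        # Snoop missed: fall back to the response from shared memory.
        return memory_result.result(), "shared_memory"


print(retrieve(0x10, {0x10: "from-cache"}, {0x10: "from-memory"}))
print(retrieve(0x20, {}, {0x20: "from-memory"}))
```

Issuing both requests at once hides the memory latency behind the snoop when the snoop misses, at the cost of a memory request that goes unused when the snoop hits.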



15. (Currently Amended) The apparatus of claim 12 [[11]], wherein: the first,
second, and third processors each include a processor core on a single package,
and wherein the shared memory system includes a shared higher-level cache. 

16. (Cancelled)

17. (Original) The apparatus of claim 11, wherein: the first processor is further to 
trigger the second processor's execution of the helper thread instruction stream 
responsive to a trigger instruction in the main thread instruction stream.

18. (Currently Amended) A method comprising:

executing a main thread including a load instruction with a first core, the first core 
associated with a first private cache; 

determining that a helper core has suffered a missing in a second private cache associated
with a second core in response to executing the for a load instruction within a helper thread
with the second core while executing a helper thread; and

prefetching load data for the load instruction into the first [[a]] private cache of a main
core responsive to missing the second private cache in response to executing the load
instruction within the helper thread with the second core.

19. (Currently Amended) The method of claim 18, wherein prefetching the load data 
for the load instruction into the first private cache further comprises: 

retrieving the load data from a shared memory system; and 

broadcasting the load data to an affinity group of cores including at least the first core in 
response to retrieving the load data from the shared memory system. 

providing the load data to the private cache of the main core.

20. (Currently Amended) The method of claim 18, further comprising: 
providing the load data for the load instruction from a shared memory system into the
second private cache associated with the second of the helper core.

21. (Currently Amended) The method of claim 18, further comprising:
providing the load data for the load instruction from a shared memory system into a the
private cache for each of a plurality of helper cores including the second core.

22. (Currently Amended) The method of claim 18, wherein prefetching the load data
for the load instruction into the first private cache further comprises:

retrieving the load data from a third private cache associated with [[of]] a third helper core;
and

providing the load data to the private cache associated with [[of]] the first main core in
response to retrieving the load data from a third private cache.

23. (Currently Amended) The method of claim 18, wherein prefetching the load data
for the load instruction into the first private cache further comprises: concurrently: 

broadcasting a request for the load data to each of a plurality of cores; and 

requesting the load data from a shared memory system. 

24. (Currently Amended) The method of claim 23, wherein prefetching the load data
for the load instruction into the first private cache further comprises: 

providing, if the load data is available in a private cache of one of the plurality of 
cores, the load data to the main core from the private cache of one of the plurality of 
cores; and

providing, if the load data is not available in a private cache of one of the plurality of 
cores, the load data to the main core from the shared memory system. 

25. (Currently Amended) An article comprising: 

a machine-readable storage medium having a plurality of machine accessible instructions, 
which if executed by a machine, cause the machine to perform operations comprising: 

determining that a helper core has suffered a miss in a helper private cache for a load 
instruction from a main thread executing on a main core while executing a helper thread; 
and 

prefetching fetching load data for the load instruction in response to determining the 
helper core has suffered the miss in the helper private cache for the load instruction; and 

pushing, unsolicited from the main core, the load data into a main private cache of the 
a main core in response to fetching the load data for the load instruction.

26. (Currently Amended) The article of claim 25, wherein: fetching load data for the
load instruction in response to determining the helper core has suffered the miss 
in the helper private cache for the load instruction comprises: the instructions that 
cause the machine to prefetch load data further comprise instructions that cause 
the machine to: retrieving[[e]] the load data from a shared memory system in
response to determining the helper core has suffered the miss in the helper private
cache for the load instruction; and wherein pushing, unsolicited from the main
core, the load data into a main private cache of the main core in response to 
fetching the load data for the load instruction comprises: provide broadcasting the 
load data to an affinity group of private caches of an affinity group of cores 
including the main private cache of the main core, to the private cache of the main
core.

27. (Currently Amended) The article of claim 25, wherein: fetching load data for the load
instruction in response to determining the helper core has suffered the miss in the
helper private cache for the load instruction comprises: retrieving the load data
from a shared memory system in response to determining the helper core has 
suffered the miss in the helper private cache for the load instruction; and wherein 
pushing, unsolicited from the main core, the load data into a main private cache of 
the main core in response to fetching the load data for the load instruction 
comprises: unicasting the load data to the main private cache of the main 
core. further comprising: a plurality of machine accessible instructions, which if 
executed by a machine, cause the machine to perform operations comprising: 
providing load data for the load instruction from a shared memory system into the
private cache of the helper core.

28. (Currently Amended) The article of claim 25, further comprising: a plurality of 
machine accessible instructions, which if executed by a machine, cause the 
machine to perform operations comprising: providing load data for the load 
instruction from a shared memory system into the private cache for each of a 
plurality of helper cores including the helper private cache of the helper core.

29. (Currently Amended) The article of claim 25, wherein: fetching load data for the
load instruction in response to determining the helper core has suffered the miss 
in the helper private cache for the load instruction comprises: the instructions that 
cause the machine to prefetch load data further comprise instructions that cause 
the machine to: retrieving[[e]] the load data from an additional helper private
cache of an additional helper core; and provide the load data to the private cache
of the main core.

30. (Cancelled)

31. (Cancelled) 

32. (Currently Amended) A system comprising: 

a memory system that includes a dynamic random access memory;

a first processor, coupled to the memory system, to execute a first instruction stream; 

a second processor, coupled to the memory system, to concurrently execute a second 
instruction stream; and 

helper threading logic to push provide fill data to a first private cache of the first 
processor, the fill data to be prefetched by the second processor and to be pushed to the first 
private cache before a request for the fill data is issued by the first processor, to the first 
processor. 

33. (Cancelled)



34. (Currently Amended) The system of claim 32, wherein: the helper threading
logic is further to push provide the fill data to the first private cache of the first
processor from a second private cache of the second processor.



35. (Currently Amended) The system of claim 32, wherein: the helper threading
logic is further to push provide the fill data to the first private cache of the first
processor from the memory system.



36. (Currently Amended) The system of claim 32, wherein the first processor and the
second processor are each processor cores, and wherein the first processor core,
the second processor core, and the helper threading logic are included in a single
processor package coupled to the memory system, further comprising: an
interconnect that manages communication between the first and second
processors.



37. (Currently Amended) The system of claim 36 [[32]], wherein: the memory
system includes a cache that is shared by the first and second processor cores
coupled to a dynamic random access memory.